IRIX Base Documentation 2002 November

Text File | 2002-10-03 | 19.7 KB | 397 lines

INTRO_CBLAS(3S)                                        INTRO_CBLAS(3S)

NAME
     INTRO_CBLAS - Introduction to the C interface to Fortran 77 Basic
     Linear Algebra Subprograms (legacy BLAS)

IMPLEMENTATION
     See individual man pages for operating system and hardware
     availability.

DESCRIPTION
     The SCSL Scientific Library provides two C/C++ interfaces to the
     Fortran 77 Basic Linear Algebra Subprograms (legacy BLAS). This man
     page describes a C interface proposed by the Basic Linear Algebra
     Subprograms Technical (BLAST) Forum as well as several SCSL
     extensions to that standard. An alternative C/C++ interface, similar
     to that implemented for the SCSL signal processing library, is
     described in individual BLAS man pages.

   Header Files
     To use the CBLAS interface, a program must include the header file
     cblas.h:

          #include <cblas.h>

     For compatibility with SCSL releases prior to version 1.3, the
     scsl_cblas.h header file may be used instead of cblas.h.

   Naming Conventions
     Names of the CBLAS routines are obtained from their legacy BLAS
     counterparts by prefixing the name with cblas_ and converting to
     lower case. For example, the routine DGEMM becomes cblas_dgemm.

   Character Arguments
     Arguments which were characters in the Fortran 77 interface are
     handled by enumerated types in the CBLAS interface, as shown in the
     following table.

     Fortran interface          CBLAS interface
     Character
     Argument     Value         Enumerated type     Value
     ----------------------------------------------------------
     SIDE         'L'           CBLAS_SIDE          CblasLeft
                  'R'                               CblasRight
     UPLO         'U'           CBLAS_UPLO          CblasUpper
                  'L'                               CblasLower
     DIAG         'N'           CBLAS_DIAG          CblasNonUnit
                  'U'                               CblasUnit
     TRANSPOSE    'N'           CBLAS_TRANSPOSE     CblasNoTrans
                  'T'                               CblasTrans
                  'C'                               CblasConjTrans
                                CBLAS_ORDER         CblasRowMajor
                                                    CblasColMajor

     The last enumerated type listed above, CBLAS_ORDER, has no Fortran
     counterpart. It is used as an additional argument to all routines
     involving two-dimensional arrays, as discussed in the following
     section.

   Array Arguments
     Array elements are required to be contiguous in memory. All legacy
     BLAS routines which take one or more two-dimensional arrays as
     arguments have an extra argument in the CBLAS interface. First in
     the argument list, this parameter is of the enumerated type:

          enum CBLAS_ORDER {CblasRowMajor=101, CblasColMajor=102};

     CblasRowMajor indicates that elements within a row of the array(s)
     are contiguous in memory while elements within array columns are
     offset by a constant stride. The stride parameter is equivalent to
     the leading dimension (LDA) in the Fortran 77 interface. Similarly,
     CblasColMajor indicates that elements within a column of the
     array(s) are contiguous in memory while elements within array rows
     are offset by a constant stride. The CBLAS_ORDER parameter applies
     to all array operands in a routine.

   Complex Data Types
     The BLAST standard does not define a complex data type for use in
     routines having complex arguments. Instead, all complex scalars and
     arrays are prototyped as void *.
     This has the advantage of allowing the use of any complex data
     structure without warnings from the compiler, provided that the
     structure meets the specifications described below. The
     disadvantage, however, is that the compiler will not catch type
     mismatches.

     Any C/C++ complex data type used in conjunction with the CBLAS
     interface must satisfy the following requirements:

     1. The real and imaginary components must be contiguous in memory.

     2. Sequential array elements must also be contiguous in memory.

     As an extension to the BLAST standard, SCSL provides support for
     stronger type checking for complex arguments. To enable this, define
     SCSL_NO_VOID_ARGS before including the CBLAS header file (for
     example, at compile time with -DSCSL_NO_VOID_ARGS as an argument or
     with an explicit #define SCSL_NO_VOID_ARGS in the source code). With
     this definition, the default behavior is as follows:

     * For C++ code in which the complex standard template library (STL)
       is used, single precision complex arguments are prototyped as
       complex<float> * and double precision complex arguments are
       prototyped as complex<double> *.

     * Otherwise, single precision complex arguments are prototyped as
       scsl_complex * and double precision complex arguments are
       prototyped as scsl_zomplex * for both C and C++.
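
     The two layout requirements listed earlier in this section can be
     checked directly in C. The following is a minimal sketch, not part
     of SCSL: the type my_complex and the function my_complex_layout_ok
     are hypothetical names introduced only for illustration.

```c
#include <stddef.h>

/* Hypothetical user-defined single precision complex type.
   This is an illustration only; it is not declared by cblas.h. */
typedef struct {
    float re;
    float im;
} my_complex;

/* Returns 1 if my_complex satisfies both layout requirements:
   1. im immediately follows re (no padding between the components);
   2. the struct has no trailing padding, so consecutive array
      elements are contiguous in memory.
   Returns 0 otherwise. */
int my_complex_layout_ok(void)
{
    int components_contiguous =
        offsetof(my_complex, re) == 0 &&
        offsetof(my_complex, im) == sizeof(float);
    int elements_contiguous =
        sizeof(my_complex) == 2 * sizeof(float);

    return components_contiguous && elements_contiguous;
}
```

     A program could call my_complex_layout_ok() once at startup and
     refuse to continue if it returns 0, since a type that fails either
     check does not meet the requirements stated above.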

     The SCSL complex types are defined as follows:

          typedef struct {
              float re;
              float im;
          } scsl_complex;

          typedef struct {
              double re;
              double im;
          } scsl_zomplex;

     Strong type checking also can be enabled in programs employing their
     own (non-SCSL, non-C++ STL) complex types. To do this, define
     SCSL_USER_COMPLEX_T=my_complex and SCSL_USER_ZOMPLEX_T=my_zomplex,
     where my_complex and my_zomplex are the names of user-defined
     complex types. These complex types, as well as SCSL_NO_VOID_ARGS,
     must be defined before including the CBLAS header file (see Example
     5 later in this man page).

   Routines that Return Indices
     Following the array indexing convention of Fortran 77, the legacy
     BLAS return indices in the range 1 <= i <= n, where n is the number
     of entries and i is the index. This allows the returned indices to
     be used to index standard Fortran arrays directly. For the same
     reason, the C interface returns indices in the range 0 <= i < n, so
     that they can be used to index C arrays directly.

     Functions that return an index are I[SDCZ]AMAX, I[SDCZ]AMIN,
     I[SD]MAX and I[SD]MIN, which are declared to be of type CBLAS_INDEX.

   Routines that Return Complex Values
     For each routine returning a complex value ([CZ]DOTC, [CZ]DOTU,
     [CZ]SUM) the BLAST standard defines a subroutine that returns the
     result through a pointer passed as the last parameter of the
     argument list. All other arguments are the same.
     The name of the subroutine is obtained by appending _sub to the
     CBLAS name; for example, CDOTC becomes cblas_cdotc_sub.

     In the SCSL implementation, complex functions can be called directly
     provided that SCSL_NO_VOID_ARGS is defined, in which case the
     function returns a structure of the appropriate type. The function
     naming and calling conventions are the same as those for real
     functions (that is, _sub is not appended to the name, and no extra
     parameter is added).

   Other Interface Notes
     Input-only arguments are declared with the const modifier.

     Non-complex scalar input arguments are passed by value. This allows
     the user to pass constants directly when desired.

     Array arguments are passed by address.

     Output scalar arguments are passed by address.

     The CBLAS routines are linked into a program by using either the
     -lscs or the -lscs_mp option. The -lscs_mp option directs the linker
     to use the multi-processor version of SCSL.

     When linking to SCSL with -lscs or -lscs_mp, the default integer
     size is 4 bytes (32 bits). Another version of the library is
     available in which integers are 8 bytes (64 bits). This version
     allows the user access to larger memory sizes and helps when porting
     legacy Cray codes. It can be linked by using the -lscs_i8 option or
     the -lscs_i8_mp option. A program may use only one of the two
     versions; 4-byte integer and 8-byte integer library calls cannot be
     mixed.
     When using the 8-byte integer version, variables of type int become
     long long and the cblas_i8.h header file should be included.

EXAMPLES
     Example 1: Multiply a real 10 x 20 matrix by a real 20 x 30 matrix.
     Use the "natural" form for C arrays.

          #include <cblas.h>

          float a[10][20], b[20][30], c[10][30];

          /* c = 1.0 * a * b + 0.0 * c, with M = 10, N = 30, K = 20 */
          cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                      10, 30, 20, 1.0f, a, 20, b, 30, 0.0f, c, 30);

     Example 2: Multiply a real 10 x 20 matrix by a real 20 x 30 matrix.
     Use 8-byte integers and column-major array ordering.

          #include <cblas_i8.h>

          /* Column-major storage: the first C dimension counts columns */
          float a[20][10], b[30][20], c[30][10];

          cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                      10LL, 30LL, 20LL, 1.0f, a, 10LL, b, 20LL,
                      0.0f, c, 10LL);

     Examples 1 and 2 will result in a warning message when compiled as C
     code and an error message when compiled as C++ code, because the
     corresponding parameters are prototyped as pointers to float while
     a, b, and c are two-dimensional arrays. There are several ways to
     avoid these problems, perhaps the easiest of which is to make
     explicit casts when calling the CBLAS routine:

          cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                      10, 30, 20, 1.0f, (float *) a, 20,
                      (float *) b, 30, 0.0f, (float *) c, 30);

     Another solution is to declare a, b, and c as the following:

          float a[10*20], b[20*30], c[10*30];

     Of course, in this case two-dimensional indexing is no longer
     possible. For example, if we assume that b has 20 rows and 30
     columns stored in row-major order, then the element in row i and
     column j of b must be referenced as b[i*30+j] rather than b[i][j].
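
     The flat indexing rule above can be wrapped in small helper
     functions. The following is a minimal sketch under stated
     assumptions: idx_row_major, idx_col_major, and ref_matmul_row_major
     are hypothetical names that are not part of SCSL, and the naive
     triple loop only illustrates what a row-major product computes; it
     is not how SCSL implements SGEMM.

```c
#include <stddef.h>

/* Offset of element (i, j) in a flat row-major array. ld is the number
   of elements between the starts of consecutive rows (the leading
   dimension). */
size_t idx_row_major(size_t i, size_t j, size_t ld)
{
    return i * ld + j;
}

/* Offset of element (i, j) in a flat column-major array. ld is the
   number of elements between the starts of consecutive columns. */
size_t idx_col_major(size_t i, size_t j, size_t ld)
{
    return j * ld + i;
}

/* Naive reference product c = a * b for flat row-major arrays, where
   a is m x k, b is k x n, and c is m x n. Each leading dimension is
   assumed to equal the row length of its matrix. */
void ref_matmul_row_major(size_t m, size_t n, size_t k,
                          const float *a, const float *b, float *c)
{
    size_t i, j, p;

    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            float sum = 0.0f;
            for (p = 0; p < k; p++) {
                sum += a[idx_row_major(i, p, k)] *
                       b[idx_row_major(p, j, n)];
            }
            c[idx_row_major(i, j, n)] = sum;
        }
    }
}
```

     With b declared as float b[20*30], the expression
     b[idx_row_major(i, j, 30)] refers to the same element as b[i*30+j].
     A reference routine of this kind can also serve as a check on the
     leading dimensions passed to cblas_sgemm for small test matrices.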

     Note that allocating each matrix dynamically as a single contiguous
     block is also acceptable:

          #include <stdlib.h>

          float *a, *b, *c;

          a = (float *) malloc(10 * 20 * sizeof(float));
          b = (float *) malloc(20 * 30 * sizeof(float));
          c = (float *) malloc(10 * 30 * sizeof(float));

     The following gives unpredictable results since the array elements
     are not contiguous in memory.

          #include <stdlib.h>

          float *a[10], *b[20], *c[10];
          int i;

          /* Each row is a separate allocation, so rows are scattered */
          for (i = 0; i < 10; i++) {
              a[i] = (float *) malloc(20 * sizeof(float));
              b[2*i] = (float *) malloc(30 * sizeof(float));
              b[2*i+1] = (float *) malloc(30 * sizeof(float));
              c[i] = (float *) malloc(30 * sizeof(float));
          }

     Example 3: Multiply a complex 10 x 20 matrix by a complex 20 x 30
     matrix. Use the C++ STL and row-major ordering.

          #include <complex.h>
          #include <cblas.h>

          complex<float> a[10][20], b[20][30], c[10][30];
          complex<float> alpha(1.0,0.0);
          complex<float> beta(0.0,0.0);

          /* Complex scalars alpha and beta are passed by address */
          cblas_cgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                      10, 30, 20, &alpha, a, 20, b, 30, &beta, c, 30);

     Because complex arguments are prototyped as void * by default, the
     multidimensional array declarations in the example above will not
     result in any type mismatches at compile time. In the following
     strong type checking examples the complex matrices are stored in
     explicitly one-dimensional form.

     Example 4: Multiply a complex 10 x 20 matrix by a complex 20 x 30
     matrix. Use the SCSL complex type and strong type checking.

          #define SCSL_NO_VOID_ARGS
          #include <cblas.h>

          scsl_complex a[10*20], b[20*30], c[10*30];
          scsl_complex alpha = {1.0, 0.0};
          scsl_complex beta = {0.0, 0.0};

          cblas_cgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                      10, 30, 20, &alpha, a, 20, b, 30, &beta, c, 30);

     Example 5: Multiply a complex 10 x 20 matrix by a complex 20 x 30
     matrix. Define your own complex type.

          /* All three macros must be defined before including cblas.h */
          #define SCSL_NO_VOID_ARGS
          #define SCSL_USER_COMPLEX_T CBLAS_COMPLEX
          #define SCSL_USER_ZOMPLEX_T CBLAS_ZOMPLEX

          typedef struct {
              float real;
              float imag;
          } CBLAS_COMPLEX;

          typedef struct {
              double real;
              double imag;
          } CBLAS_ZOMPLEX;

          #include <cblas.h>

          CBLAS_COMPLEX a[10*20], b[20*30], c[10*30];
          CBLAS_COMPLEX alpha = {1.0, 0.0};
          CBLAS_COMPLEX beta = {0.0, 0.0};

          cblas_cgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                      10, 30, 20, &alpha, a, 20, b, 30, &beta, c, 30);

SEE ALSO
     INTRO_SCSL(3S), INTRO_BLAS1(3S), INTRO_BLAS2(3S), INTRO_BLAS3(3S),
     INTRO_LAPACK(3S)

     The working document for the Basic Linear Algebra Subprograms (BLAS)
     standard from the Basic Linear Algebra Subprograms Technical (BLAST)
     Forum is available at
     http://www.netlib.org/cgi-bin/checkout/blast/blast.pl.